695 research outputs found

    On feature selection protocols for very low-sample-size data

    High-dimensional data with very few instances are typical in many application domains. Selecting a highly discriminative subset of the original features is often the main interest of the end user. The widely used feature selection protocol for this type of data consists of two steps. First, features are selected from the data (possibly through cross-validation), and, second, a cross-validation protocol is applied to test a classifier using the selected features. The selected feature set and the testing accuracy are then returned to the user. For lack of a better option, the same low-sample-size dataset is used in both steps. Questioning the validity of this protocol, we carried out an experiment using 24 high-dimensional datasets, three feature selection methods and five classifier models. We found that the accuracy returned by the above protocol is heavily biased, and therefore propose an alternative protocol which avoids the contamination by including both steps in a single cross-validation loop. Statistical tests verify that the classification accuracy returned by the proper protocol is significantly closer to the true accuracy (estimated from an independent testing set) than that returned by the currently favoured protocol.
    Funding: project RPG-2015-188 funded by The Leverhulme Trust, UK, and project TIN2015-67534-P (MINECO/FEDER, UE) funded by the Ministerio de Economía y Competitividad of the Spanish Government and the European Union FEDER fund.
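
The difference between the two protocols in the abstract above can be illustrated with a minimal sketch. This is not the authors' experimental code: the mean-difference filter, the nearest-centroid classifier, and the names `select_features`, `nearest_centroid_predict` and `cv_accuracy` are assumptions chosen to keep the example dependency-free. The only thing it is meant to show is where feature selection sits relative to the cross-validation split.

```python
import random


def select_features(X, y, k):
    """Rank features by absolute difference of the two class means; return the top-k indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Label x with the class whose centroid (on the chosen features) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, k, n_folds, select_inside_loop):
    """Cross-validated accuracy; selection is done per fold or once on all data."""
    n = len(X)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    # Two-step protocol: features are chosen once, using every instance.
    global_features = None if select_inside_loop else select_features(X, y, k)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Single-loop protocol: features are chosen from the training part only.
        features = select_features(X_tr, y_tr, k) if select_inside_loop else global_features
        for i in test_idx:
            correct += nearest_centroid_predict(X_tr, y_tr, X[i], features) == y[i]
    return correct / n


# Pure-noise data: 20 instances, 200 features, labels unrelated to the features.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(20)]
y = [i % 2 for i in range(20)]

biased = cv_accuracy(X, y, k=5, n_folds=5, select_inside_loop=False)
proper = cv_accuracy(X, y, k=5, n_folds=5, select_inside_loop=True)
print(f"selection outside the loop: {biased:.2f}")
print(f"selection inside the loop:  {proper:.2f}")
```

Because the labels here are independent of the features, the true accuracy is 50%, so an estimate well above that reflects selection bias rather than signal. Setting `select_inside_loop=True` keeps each held-out fold out of the selection step.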

    Combining univariate approaches for ensemble change detection in multivariate data

    Detecting change in multivariate data is a challenging problem, especially when class labels are not available. There is a large body of research on univariate change detection, notably in control charts developed originally for engineering applications. We evaluate univariate change detection approaches, including those in the MOA framework, built into ensembles where each member observes a feature in the input space of an unsupervised change detection problem. We present a comparison between the ensemble combinations and three established ‘pure’ multivariate approaches over 96 data sets, and a case study on the KDD Cup 1999 network intrusion detection dataset. We found that ensemble combination of univariate methods consistently outperformed multivariate methods on the four experimental metrics.
    Funding: project RPG-2015-188 funded by The Leverhulme Trust, UK; Spanish Ministry of Economy and Competitiveness through project TIN 2015-67534-P; and the Spanish Ministry of Education, Culture and Sport through Mobility Grant PRX16/00495.
    Data: The 96 datasets were originally curated for use in the work of Fernández-Delgado et al. [53] and accessed from the personal web page of the author. The KDD Cup 1999 dataset used in the case study was accessed from the UCI Machine Learning Repository [10

    Restricted set classification: Who is there?

    We consider a problem where a set X of N objects (instances) coming from c classes has to be classified simultaneously. A restriction is imposed on X in that the maximum possible number of objects from each class is known, hence we dubbed the problem "who-is-there?". We compare three approaches to this problem: (1) independent classification, whereby each object is assigned to the class with the largest posterior probability; (2) a greedy approach which enforces the restriction; and (3) a theoretical approach which, in addition, maximises the likelihood of the label assignment, implemented through the Hungarian assignment algorithm. Our experimental study consists of two parts. The first part includes a custom-made chess data set where the pieces on the chess board must be recognised together from an image of the board. In the second part, we simulate the restricted set classification scenario using 96 datasets from a recently collated repository (University of Santiago de Compostela, USC). Our results show that the proposed approach (3) outperforms approaches (1) and (2).
    Funding: Spanish Ministry of Economy and Competitiveness through project TIN 2015-67534-
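
A small sketch may help to show how approach (1) and approach (3) in the abstract above can disagree. It is an illustration under stated assumptions, not the paper's implementation: instead of the Hungarian algorithm it uses an exhaustive search over label assignments, which gives the same maximum-likelihood answer but is only practical for a handful of objects. The function names and the posterior values are hypothetical.

```python
import math
from itertools import permutations


def independent_labels(posteriors):
    """Approach (1): each object takes the class with the largest posterior."""
    return [max(range(len(p)), key=lambda c: p[c]) for p in posteriors]


def restricted_labels(posteriors, max_per_class):
    """Maximum-likelihood labelling subject to a per-class count limit.

    Brute-force stand-in for the Hungarian algorithm: every class c contributes
    max_per_class[c] slots, and each object is matched to a distinct slot.
    """
    slots = [c for c, limit in enumerate(max_per_class) for _ in range(limit)]
    n_objects = len(posteriors)
    best, best_loglik = None, -math.inf
    # sorted(set(...)) removes duplicate assignments and fixes the search order.
    for assignment in sorted(set(permutations(slots, n_objects))):
        loglik = 0.0
        for obj, cls in enumerate(assignment):
            p = posteriors[obj][cls]
            loglik += math.log(p) if p > 0 else -math.inf
        if loglik > best_loglik:
            best, best_loglik = list(assignment), loglik
    return best


# Hypothetical posteriors for three objects over three classes,
# with at most one object allowed per class.
posteriors = [
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
]
max_per_class = [1, 1, 1]

print(independent_labels(posteriors))                # [0, 0, 1]
print(restricted_labels(posteriors, max_per_class))  # [1, 0, 2]
```

Independent classification puts two objects in class 0, which violates the restriction. The restricted labelling gives class 0 to the object that claims it most strongly and reassigns the others so that the product of posteriors is as large as possible.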

    When is resampling beneficial for feature selection with imbalanced wide data?

    This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classification algorithms, analyzing the results for different percentages of selected features, and establishing the statistical significance using Bayesian tests. Some general conclusions of the study are that it is better to use RUS before the feature selection, while ROS and SMOTE offer better results when applied afterwards. Additionally, specific results are also obtained depending on the classifier used. For example, for Gaussian SVM the best performance is obtained when the feature selection is done with SVM-RFE before balancing the data with RUS.
    Funding: “La Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was also supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds.

    Lifelong Learning from Sustainable Education: An Analysis with Eye Tracking and Data Mining Techniques

    The use of learning environments that apply Advanced Learning Technologies (ALTs) and Self-Regulated Learning (SRL) is increasingly frequent. In this study, eye-tracking technology was used to analyze scan-path differences in a History of Art learning task. The study involved 36 participants (students versus university teachers with and without previous knowledge). The scan-paths were registered during the viewing of a video based on SRL. Subsequently, the participants were asked to solve a crossword puzzle, and relevant vs. non-relevant Areas of Interest (AOI) were defined. Conventional statistical techniques (ANCOVA) and data mining techniques (string-edit methods and k-means clustering) were applied. The former only detected differences for the crossword puzzle. However, the latter, with the Uniform Distance model, detected the participants with the most effective scan-path. The use of this technique successfully predicted 64.9% of the variance in learning results. The contribution of this study is to analyze the teaching–learning process with resources that allow a personalized response to each learner, understanding education as a right throughout life from a sustainable perspective.
    Funding: European Project “Self-Regulated Learning in SmartArt” 2019-1-ES01-KA204-065615 and the Research Funding Program (Funding of dissemination of research results, 2020) of the Vice-Rectorate for Research and Knowledge Transfer of the University of Burgos to the Recognized Investigation Group DATAHES.

    Approx-SMOTE: Fast SMOTE for Big Data on Apache Spark

    One of the main goals of Big Data research is to find new data mining methods that are able to process large amounts of data in acceptable times. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed, with the added requirement that the solution run in an acceptable execution time. In this paper we present Approx-SMOTE, a parallel implementation of the SMOTE algorithm for the Apache Spark framework. The key difference from the original SMOTE, besides parallelism, is that it uses an approximated version of k-Nearest Neighbor, which makes it highly scalable. Although an implementation of SMOTE for Big Data already exists (SMOTE-BD), it uses an exact Nearest Neighbor search, which does not make it entirely scalable. Approx-SMOTE, on the other hand, is able to achieve up to 30 times faster run times without sacrificing the improved classification performance offered by the original SMOTE.
    Funding: “La Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was supported by the Junta de Castilla y León under project BU055P20 and by the Ministry of Science and Innovation of Spain under project PID2020-119894GB-I00, co-financed through European Union FEDER funds. It was also supported through Consejería de Educación of the Junta de Castilla y León and the European Social Fund through a pre-doctoral grant (EDU/1100/2017). This material is based upon work supported by Google Cloud.
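
For readers unfamiliar with SMOTE, the interpolation step that both the original algorithm and Approx-SMOTE rely on can be sketched as follows. This is a single-machine illustration only: it uses an exact brute-force neighbour search, whereas the abstract states that Approx-SMOTE replaces this with an approximate k-Nearest Neighbor search on Apache Spark. The function name `smote_sketch`, its parameters and the sample points are assumptions made for the example.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Exact neighbour search: sort the other minority points by squared distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional minority class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7]]
synthetic = smote_sketch(minority, n_new=5)
for point in synthetic:
    print([round(v, 3) for v in point])
```

The neighbour search is the step whose cost grows fastest with the number of minority instances, which is why swapping it for an approximate search is the natural place to gain scalability.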

    The Antioxidant Activity of Thymus serpyllum Extract Protects against the Inflammatory State and Modulates Gut Dysbiosis in Diet-Induced Obesity in Mice

    There is increasing interest in alternative therapies for metabolic syndrome that combine efficacy with a good safety profile. This study therefore aimed to evaluate the effect of an extract of Thymus serpyllum, containing rosmarinic acid, on high-fat diet (HFD)-induced obesity in mice, highlighting the impact of its antioxidant activity on the inflammatory status and gut dysbiosis. The extract was administered daily (50, 100 and 150 mg/kg) to HFD-fed mice. The treatment reduced body weight gain and improved glucose and lipid metabolic profiles. Moreover, the extract ameliorated the inflammatory status, with the c-Jun N-terminal kinases (JNK) pathway being involved, and showed a significant antioxidant effect by the reduction of radical scavenging activity and the mitigation of lipid peroxidation. The extract was also able to modulate the altered gut microbiota, restoring microbial richness and diversity, and augmenting the counts of short-chain fatty acid producing bacteria, which have been associated with the maintenance of gut permeability and weight regulation. In conclusion, the antioxidant activity of Thymus serpyllum extract displayed a positive impact on obesity and its metabolic alterations, also reducing systemic inflammation. These effects may be mediated by modulation of the gut microbiota.
    Funding: Junta de Andalucía (CTS 164); Instituto de Salud Carlos III and European Commission (PI19.01058); Spanish Government (AGL201567995-C3-3-R).

    Estimation of time dedication to a pathophysiology practice

    Abstracts of the IV Congreso VetDoc de Docencia Veterinaria, León 2017 (6-7 July). The aim of this work was to check whether the actual teaching load of the “case study” practical corresponding to the pathophysiology part correlated adequately with the load established in the teaching plan.

    Probiotic and Functional Properties of Limosilactobacillus reuteri INIA P572

    Limosilactobacillus reuteri INIA P572 is a strain able to produce the antimicrobial compound reuterin in dairy products, exhibiting a protective effect against some food-borne pathogens. In this study, we investigated some probiotic properties of this strain, such as resistance to gastrointestinal passage or to colonic conditions, reuterin production in a colonic environment, and immunomodulatory activity, using different in vitro and in vivo models. The results showed a high resistance of this strain to gastrointestinal conditions, as well as the capacity to grow and produce reuterin in a human colonic model. Although the in vitro assays using the RAW 264.7 macrophage cell line did not demonstrate direct immunomodulatory properties, the in vivo assays using a Dextran Sulphate Sodium (DSS)-induced colitis mouse model showed clear immunomodulatory and protective effects of this strain.
    Funding: This work was supported by project no. RTA2017-00002-00-00 from the Spanish Ministry of Science and Innovation, by the Junta de Andalucía (CTS 164) and Instituto de Salud Carlos III (PI19/01058) with funds from the European Union.